Learning narrative structure from annotated folktales
Author
Abstract
Narrative structure is a ubiquitous and intriguing phenomenon. By virtue of structure we recognize the presence of Villainy or Revenge in a story, even if that word is not actually present in the text. Narrative structure is an anvil for forging new artificial intelligence and machine learning techniques, and is a window into abstraction and conceptual learning as well as into culture and its influence on cognition. I advance our understanding of narrative structure by describing Analogical Story Merging (ASM), a new machine learning algorithm that can extract culturally relevant plot patterns from sets of folktales. I demonstrate that ASM can learn a substantive portion of Vladimir Propp’s influential theory of the structure of folktale plots. The challenge was to take descriptions at one semantic level, namely, an event timeline as described in folktales, and abstract to the next higher level: structures such as Villainy, Struggle-Victory, and Reward. ASM is based on Bayesian Model Merging, a technique for learning regular grammars. I demonstrate that, despite ASM’s large search space, a carefully tuned prior allows the algorithm to converge, and furthermore it reproduces Propp’s categories with a chance-adjusted Rand index of 0.511 to 0.714. Three important categories are identified with F-measures above 0.8. The data are 15 Russian folktales, comprising 18,862 words, a subset of Propp’s original tales. This subset was annotated for 18 aspects of meaning by 12 annotators using the Story Workbench, a general text-annotation tool I developed for this work. Each aspect was doubly annotated and adjudicated at inter-annotator F-measures that cluster around 0.7 to 0.8. It is the largest, most deeply annotated narrative corpus assembled to date. The work has significance far beyond folktales.
First, it points the way toward important applications in many domains, including information retrieval, persuasion and negotiation, natural language understanding and generation, and computational creativity. Second, abstraction from natural language semantics is a skill that underlies many cognitive tasks, and so this work provides insight into those processes. Finally, the work opens the door to a computational understanding of cultural influences on cognition and understanding cultural differences as captured in stories.
Dissertation Supervisor:
Patrick H. Winston, Professor, Electrical Engineering and Computer Science
Dissertation Committee:
Whitman A. Richards, Professor, Brain & Cognitive Sciences
Peter Szolovits, Professor, Electrical Engineering and Computer Science & Harvard-MIT Division of Health Sciences and Technology
Joshua B. Tenenbaum, Professor, Brain & Cognitive Sciences
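The abstract reports agreement with Propp’s categories as a chance-adjusted Rand index. As a rough illustration of how such a score is computed, the sketch below implements the standard adjusted Rand index (the Hubert-Arabie form) in plain Python. The function name and the toy labels are hypothetical and are not taken from the thesis, and the thesis may compute its score with different tooling.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Chance-adjusted Rand index between two labelings of the same items."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("both labelings must cover the same items")

    total_pairs = comb(len(labels_true), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: the partitions agree trivially

    # Pairs of items grouped together in both labelings, in the reference
    # labeling only, and in the predicted labeling only.
    cells = Counter(zip(labels_true, labels_pred))
    together_both = sum(comb(c, 2) for c in cells.values())
    together_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    together_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())

    # Agreement expected by chance, and the maximum attainable agreement.
    expected = together_true * together_pred / total_pairs
    maximum = (together_true + together_pred) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. every item in a single cluster
    return (together_both - expected) / (maximum - expected)


# Hypothetical toy data: reference categories vs. learned cluster ids.
reference = ["Villainy", "Villainy", "Reward", "Struggle-Victory"]
learned = [0, 0, 1, 1]
score = adjusted_rand_index(reference, learned)
```

A score of 1 means the learned clusters match the reference categories exactly (up to renaming), a score near 0 means agreement no better than chance, and negative scores mean agreement worse than chance.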
Similar resources
Mark Alan Finlayson: Inferring Propp’s Functions from Semantically Annotated Text
Vladimir Propp’s Morphology of the Folktale is a seminal work in folkloristics and a compelling subject of computational study. I demonstrate a technique for learning Propp’s functions from semantically annotated text. Fifteen folktales from Propp’s corpus were annotated for semantic roles, co-reference, temporal structure, event sentiment, and dramatis personae. I derived a set of merge rules ...
ProppLearner: Deeply annotating a corpus of Russian folktales to enable the machine learning of a Russian formalist theory
I describe the collection and deep annotation of the semantics of a corpus of Russian folktales. This corpus, which I call the ‘ProppLearner’ corpus, was assembled to provide data for an algorithm designed to learn Vladimir Propp’s morphology of Russian hero tales. The corpus is the most deeply annotated narrative corpus available at this time. The algorithm and learning results are described e...
Appraisal of Computational Model for Yorùbá Folktale Narrative
Our effort at developing computational models for African narratives, particularly those of Yorùbá folktales, is challenged by the diversity in concepts and methodologies in the discipline. This motivated us to pause and consider the various computational models of narratives in the literature. This is with a view to finding the most appropriate or otherwise adapt a closely related one for the ...
Corpus Annotation in Service of Intelligent Narrative Technologies
Annotated corpora have stimulated great advances in the language sciences. The time is ripe to bring that same stimulation, and consequent benefits, to computational approaches to narrative. I describe an effort to construct a corpus of semantically annotated stories. I outline the structure of the corpus, a structure which colloquially can be described as a “handful of handfuls.” One handful o...
Measuring the Structural and Conceptual Similarity of Folktales using Plot Graphs
This paper presents an approach to organizing folktales based on a data structure called a plot graph, which captures the narrative flow of events in a folktale. The similarity between two folktales can be computed as the structural similarity between their corresponding plot graphs. This is performed using the well-known Needleman-Wunsch algorithm. To test the efficacy of this approach, experi...